Introduction to Paxos
Get introduced to Paxos, a popular algorithm for state machine replication.
Motivation: Log replication via consensus#
Paxos is a consensus algorithm used by multiple entities to agree on something of interest, such as maintaining consistency among multiple copies of data. As we learned in our state machine replication chapter, we can model a broad range of computation as a state machine, and replicating such a state machine at different nodes provides us fault tolerance. Commands transition a state machine from one consistent state to the next. To ensure consistency, it is necessary to establish consensus among the replicas of the state machine regarding the specific command to be executed and the order in which it should be executed. Paxos being a consensus algorithm is a natural choice in this scenario. (We will learn about Paxos in the context of state machine replication because it is generic and applicable for many use cases.)
We will use the Paxos algorithm to achieve consensus on a single value across state machine replicas. In the real world, Paxos-based solutions are used to implement replicated logs. A log can be viewed as a series of placeholders (called a slot) to save clients' commands. These logs record the commands of clients, one per slot. Paxos is agnostic about the specifics of these commands because they are application specific. These commands could be machine instructions (for example, jmp 0x12366) or keys and their associated values in a hash table or any other state machine. These logs can be viewed as sequential read-ahead files. Once consensus has been achieved on the slot's commands, these commands are executed on the local state machine. Each execution transitions a state machine from one consistent state to the next.
We use a Paxos instance to achieve consensus on a single slot of the log, as shown in the following slide deck. There can be multiple clients wanting to write their command at a specific slot of the log, and the Paxos algorithm helps them reach a consensus on whose value will be chosen in the slot. Multiple instances of Paxos can be active concurrently for different slot numbers.
1 of 9
2 of 9
3 of 9
4 of 9
5 of 9
6 of 9
7 of 9
8 of 9
9 of 9
The basic Paxos algorithm for achieving consensus for a single slot is called single-decree Paxos (or basic Paxos). When we use multiple instances of Paxos concurrently, such a setting is called multi-decree Paxos (or multi-Paxos). In this chapter, we will first explain basic Paxos in different circumstances, including normal operation and in the presence of failures. Then we will explain the working of multi-Paxos.
Many software products use Paxos algorithms. An example is Google's Spanner database which uses multiple Paxos groups. Each group manages a shard of the database and achieves consensus on how the database will change when concurrent commands/transactions are active.
Note: The original Paxos paper primarily focuses on the proofs of correctness and leaves out many implementation details. As a result, many variants of Paxos algorithms are in use in many real software products due to different interpretations of the algorithm and various tweaks made to it. In our discussion, we will adhere to the original algorithm as much as possible, with occasional pointers to enhancements.
Client interactions#
When a client has a command to execute, it submits its command to a replica server via a remote procedure call (RPC). The replica gets the command chosen at all the servers after getting the consensus. Once consensus has been achieved and the specific command has been executed by the state machine, the result of execution is returned to the client.
Failure model#
We assume a non-Byzantine failure environment for the Paxos algorithm. Network messages can be lost or delayed.
Goals for Paxos#
The goal of basic Paxos is to get the consensus on a slot's value at all the replicas by enforcing the safety and liveness properties.
Safety conditions#
To choose a value, the following safety properties must hold:
Only a single value is chosen.
The value should be chosen only from the proposed values.
None of the processes learns that a value is chosen until that value is actually chosen.
Liveness#
The goal of the Paxos algorithm is to eventually decide on a single value, choosing it from multiple proposed values., This property is known as the liveness of Paxos.
Note: Paxos algorithm is always safe but is not guaranteed to be live always. For example, if majority of replicas are not available, progress can not be made until a majority is available.
In the next lesson, we will explain the basic Paxos algorithm in detail.
Disclaimer: Our presentation of Paxos is partially influenced by the user study conducted by Diego Ongaro and John Ousterhout, who aimed to determine which consensus algorithm (Paxos or Raft) is easier to understand.
Quiz on State Machine Replication
Basic Paxos Protocol Design